Bug fixed for DIANN by ypriverol · Pull Request #600 · bigbio/pmultiqc

ypriverol · 2026-02-06T11:05:32Z

Pull Request

Description

Brief description of the changes made in this PR.

Type of Change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update
Performance improvement
Code refactoring
Test addition/update
Updates to the dependencies has been done.

Summary by CodeRabbit

Release Notes

Bug Fixes
- Improved data validation to gracefully handle missing data without crashing.
- Added automatic fallback logic for intensity column selection when preferred data is unavailable.
- Visualization reports now skip rendering when required data is insufficient, with appropriate warnings logged.
Style
- Updated visualization control configurations for enhanced chart interactivity.

Enhance percentage display

This fixes issue #598 where DIA-NN data processing would fail with certain datasets like PXD063291. Changes: 1. Fixed sdrf_pipelines import paths for compatibility with newer versions (0.1.0+) while maintaining backward compatibility with older versions. The module structure changed from sdrf_pipelines.openms to sdrf_pipelines.converters.openms. 2. Added robust column existence checks in dia_utils.py: - _prepare_quant_table_data: Now checks for required columns and falls back to Precursor.Quantity if Precursor.Normalised is missing - create_peptides_table: Handles missing Q.Value column gracefully - create_protein_table: Uses correct intensity column dynamically - draw_diann_quant_table: Skips tables gracefully when data is missing - _merge_condition_data: Uses correct intensity column for merging 3. Improved error handling to log warnings instead of raising exceptions when optional data is missing, allowing reports to complete with available data. https://claude.ai/code/session_01VrgNLeqdhEHweocVpUmcRT

Keep using the original import paths from sdrf_pipelines (pre-0.1.0 version): - sdrf_pipelines.openms.openms.UnimodDatabase - sdrf_pipelines.openms.openms.OpenMS The column check fixes for DIA-NN processing remain in place. https://claude.ai/code/session_01VrgNLeqdhEHweocVpUmcRT

Added column existence checks to prevent crashes when DIA-NN reports are missing expected columns: - _process_diann_statistics: checks for Protein.Group, Modified.Sequence - _process_peptide_search_scores: checks for Modified.Sequence, Q.Value - _process_modifications: checks for Modified.Sequence - _process_run_data: checks for Run, Modified.Sequence, Modifications, Protein.Group - draw_dia_ids_rt: checks for Run, RT Functions now return safe default values and log warnings when columns are missing, allowing reports to complete with available data. https://claude.ai/code/session_01VrgNLeqdhEHweocVpUmcRT

Add robust error handling and fallback logic to DIA-NN module

coderabbitai · 2026-02-06T11:06:07Z

Warning

Rate limit exceeded

@ypriverol has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 17 minutes and 56 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📝 Walkthrough

Walkthrough

This PR adds input validation and defensive returns to DIA-NN report processing, introduces intensity column handling with fallback logic across data structures, and updates CPSwitch configuration flags in plotting functions to improve data integrity and control flow.

Changes

Cohort / File(s)	Summary
DIA Utility Processing `pmultiqc/modules/common/dia_utils.py`	Added validation guards to _process_diann_statistics, _process_peptide_search_scores, _process_modifications, and _process_run_data to return early with placeholders when required columns (Protein.Group, Modified.Sequence, Q.Value, Run, etc.) are missing. Introduced intensity column fallback logic that prefers Precursor.Normalised over Precursor.Quantity and propagates the selected column through report_data and cond_report_data attributes for downstream calculations.
Plotting Configuration `pmultiqc/modules/common/plots/dia.py`, `pmultiqc/modules/common/plots/ms.py`	Added "cpswitch_c_active": False flag to draw_dia_whole_exp_charge and enabled CPSwitch control ("cpswitch": True) in three peak_intensity_distribution, distribution_of_precursor_charges, and peaks_per_ms2 plot configurations with corresponding cpswitch_c_active flags.

Sequence Diagram

sequenceDiagram
    participant Report as DIA Report
    participant Parser as parse_diann_report
    participant ModProc as _process_modifications
    participant RunProc as _process_run_data
    participant QuantTable as _prepare_quant_table_data
    participant Condition as _merge_condition_data
    participant PlotFunc as create_peptides_table/protein_table

    Report->>Parser: Input report data
    Parser->>ModProc: Process modifications
    alt Modifications present
        ModProc-->>Parser: Success (sets Modifications)
    else Modifications missing
        ModProc-->>Parser: Returns False (warning logged)
    end
    
    alt Modifications succeeded
        Parser->>RunProc: Process run data
        RunProc-->>Parser: cal_num_table_data (with validation)
    else Modifications failed
        Parser->>Parser: Use placeholder cal_num_table_data
    end
    
    Parser->>QuantTable: Prepare quantification table
    QuantTable->>QuantTable: Check Precursor.Normalised or fallback to Precursor.Quantity
    QuantTable->>QuantTable: Store intensity_col in attrs
    QuantTable-->>Parser: report_data (with intensity_col in attrs)
    
    Parser->>Condition: Merge condition data
    Condition->>Condition: Read intensity_col from attrs (with fallback)
    Condition->>Condition: Store intensity_col in cond_report_data.attrs
    Condition-->>Parser: cond_report_data
    
    Parser->>PlotFunc: Create quantification plots
    PlotFunc->>PlotFunc: Retrieve intensity_col from attrs
    PlotFunc->>PlotFunc: Calculate Average Intensity using dynamic column
    PlotFunc-->>PlotFunc: Generate plots or skip with warning if data insufficient

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Minor changes #419: Modifies _process_modifications logic in dia_utils.py; this PR's modifications validation changes overlap with UnimodDatabase initialization centralization.
Fix bugs and add heatmap in DIA #324: Extensive DIA plotting refactoring including draw_dia_whole_exp_charge and parse_diann_report wiring; directly related to this PR's CPSwitch flag changes and plot guards.
Major Bug fixing for MAxQuant and DIANN datasets in PRIDE #265: Introduces robustness and validation changes to DIA-NN data processing and plotting (guarding required columns, conditional plotting) similar to this PR's defensive coding approach.

Suggested labels

Review effort 3/5

Suggested reviewers

daichengxin
yueqixuan

Poem

🐰 A validation rabbit hops with care,
Guards each column, checks with flair,
Intensity flows through attrs so neat,
Fallbacks ready when data's not complete,
CPSwitch flags dance in the light! ✨

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 56.25% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check	❓ Inconclusive	The pull request title 'Dev' is vague and generic, providing no meaningful information about the changeset.	Replace with a descriptive title that reflects the main change, such as 'Add robust DIA-NN processing with input validation and safe defaults' or similar.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch dev

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@pmultiqc/modules/common/dia_utils.py`:
- Around line 160-165: The callers pass pep_plot from _process_diann_statistics
into draw_num_pep_per_protein and will crash if pep_plot is None; add a guard at
the start of draw_num_pep_per_protein to check for pep_plot is None, log or warn
about missing peptide plot data, and return a safe value (e.g., None or empty
figure) instead of accessing pep_plot.dict["data"]. Update
draw_num_pep_per_protein (and any closely related plotting helpers) to
early-return when pep_plot is None so quantms.py and diann.py no longer need to
change their call sites.

coderabbitai · 2026-02-06T11:11:05Z

pmultiqc/modules/common/dia_utils.py

 def _process_diann_statistics(report_data):
    """Process DIA-NN statistics and create peptide plot."""
+    required_cols = ["Protein.Group", "Modified.Sequence"]
+    if not all(col in report_data.columns for col in required_cols):
+        log.warning(f"Missing required columns for statistics: {[c for c in required_cols if c not in report_data.columns]}")
+        return 0, 0, None


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Search for parse_diann_report usage and context rg -n --type=py -C5 'parse_diann_report' -g '!*test*'

Repository: bigbio/pmultiqc

Length of output: 3955

🏁 Script executed:

#!/bin/bash # Get the full parse_diann_report function implementation sed -n '32,150p' pmultiqc/modules/common/dia_utils.py

Repository: bigbio/pmultiqc

Length of output: 4386

🏁 Script executed:

#!/bin/bash # Get the _process_diann_statistics function sed -n '160,200p' pmultiqc/modules/common/dia_utils.py

Repository: bigbio/pmultiqc

Length of output: 1759

🏁 Script executed:

#!/bin/bash # Check how pep_plot is used in quantms.py after assignment rg -n --type=py 'self.pep_plot' pmultiqc/modules/quantms/quantms.py -A3 -B1

Repository: bigbio/pmultiqc

Length of output: 1837

🏁 Script executed:

#!/bin/bash # Check how pep_plot is used in diann.py after assignment rg -n --type=py 'self.pep_plot' pmultiqc/modules/diann/diann.py -A3 -B1

Repository: bigbio/pmultiqc

Length of output: 420

🏁 Script executed:

#!/bin/bash # Find the function that receives pep_plot rg -n --type=py 'def.*draw_num_pep_per_protein' -A5

Repository: bigbio/pmultiqc

Length of output: 476

🏁 Script executed:

#!/bin/bash # Check if there's any None-checking for pep_plot before calling draw_num_pep_per_protein rg -n --type=py 'draw_num_pep_per_protein' -B5 -A1 pmultiqc/modules/diann/diann.py

Repository: bigbio/pmultiqc

Length of output: 623

🏁 Script executed:

#!/bin/bash # Check the full draw_num_pep_per_protein function to see how pep_plot is used sed -n '1001,1030p' pmultiqc/modules/common/plots/id.py

Repository: bigbio/pmultiqc

Length of output: 901

🏁 Script executed:

#!/bin/bash # Check the same in quantms.py for draw_num_pep_per_protein usage rg -n --type=py 'draw_num_pep_per_protein' -B10 -A5 pmultiqc/modules/quantms/quantms.py

Repository: bigbio/pmultiqc

Length of output: 1802

🏁 Script executed:

#!/bin/bash # Check if there's any None-check around the draw_num_pep_per_protein calls in quantms.py sed -n '350,365p' pmultiqc/modules/quantms/quantms.py

Repository: bigbio/pmultiqc

Length of output: 648

🏁 Script executed:

#!/bin/bash # Check the broader context around draw_num_pep_per_protein in quantms.py sed -n '340,365p' pmultiqc/modules/quantms/quantms.py

Repository: bigbio/pmultiqc

Length of output: 1039

🏁 Script executed:

#!/bin/bash # Check broader context in quantms.py around draw_num_pep_per_protein calls sed -n '320,365p' pmultiqc/modules/quantms/quantms.py

Repository: bigbio/pmultiqc

Length of output: 1888

🏁 Script executed:

#!/bin/bash # Double-check if there's any conditional logic around draw_num_pep_per_protein calls sed -n '355,365p' pmultiqc/modules/quantms/quantms.py && echo "---" && sed -n '185,195p' pmultiqc/modules/diann/diann.py

Repository: bigbio/pmultiqc

Length of output: 688

Callers do not handle pep_plot=None gracefully—this will crash at runtime.

When required columns are missing, _process_diann_statistics returns None for pep_plot. Both quantms.py (line 358) and diann.py (line 187) pass this directly to draw_num_pep_per_protein, which immediately accesses pep_plot.dict["data"] (id.py:1006) without a None check. This causes an AttributeError if column validation fails.

Add a guard in draw_num_pep_per_protein or skip the call when pep_plot is None.

🤖 Prompt for AI Agents

In `@pmultiqc/modules/common/dia_utils.py` around lines 160 - 165, The callers pass pep_plot from _process_diann_statistics into draw_num_pep_per_protein and will crash if pep_plot is None; add a guard at the start of draw_num_pep_per_protein to check for pep_plot is None, log or warn about missing peptide plot data, and return a safe value (e.g., None or empty figure) instead of accessing pep_plot.dict["data"]. Update draw_num_pep_per_protein (and any closely related plotting helpers) to early-return when pep_plot is None so quantms.py and diann.py no longer need to change their call sites.

yueqixuan and others added 6 commits January 23, 2026 09:42

Enhancement

cc1cd48

Merge pull request #597 from yueqixuan/dev

831ca69

Enhance percentage display

Merge pull request #599 from bigbio/claude/fix-bug-eG8gO

58a2ece

Add robust error handling and fallback logic to DIA-NN module

ypriverol changed the title ~~Dev~~ Bug fixed for DIANN Feb 6, 2026

coderabbitai bot reviewed Feb 6, 2026

View reviewed changes

ypriverol added 3 commits February 6, 2026 11:15

Update sdrf-pipelines version to 0.0.33

d729028

Update sdrf-pipelines version to 0.0.33

37ec954

Update sdrf-pipelines version to 0.0.33

bd8a441

ypriverol merged commit 1a5bc9d into main Feb 6, 2026
25 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bug fixed for DIANN #600

Bug fixed for DIANN #600
ypriverol merged 9 commits intomainfrom
dev

ypriverol commented Feb 6, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Feb 6, 2026 •

edited

Loading

Rate limit exceeded

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Feb 6, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

ypriverol commented Feb 6, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Pull Request

Description

Type of Change

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai bot commented Feb 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 6, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

ypriverol commented Feb 6, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Feb 6, 2026 •

edited

Loading